NSF PAR Search | NSF Public Access Repository

Assessing the Adequacy of Morphological Models Using Posterior Predictive Simulations

https://doi.org/10.1093/sysbio/syae055

Mulvey, Laura P_A; May, Michael R; Brown, Jeremy M; Höhna, Sebastian; Wright, April M; Warnock, Rachel C_M (October 2024, Systematic Biology)
Klopfstein, Seraina (Ed.)
Abstract Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics a major step in estimating a tree is in choosing an appropriate model of character evolution. While the most common character data used is molecular sequence data, morphological data remains a vital source of information. The use of morphological characters allows for the incorporation of fossil taxa, and despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, it is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Mk Lewis model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any extensions to it, provide a sufficiently good description of the process of morphological evolution. In this study we investigate the impact of different morphological models on empirical tetrapod datasets. Specifically, we compare unpartitioned Mk models with those where characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model has an impact on both topology and branch lengths, highlighting the importance of model choice. Through simulations, we validate the use of the model adequacy approach, posterior predictive simulations, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection. We demonstrate how model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model are often performing adequately in capturing the evolutionary dynamics that generated our data. We do not find any preference for a particular model extension across multiple datasets, indicating that there is no “one size fits all” when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.
Full Text Available
The Expected Behaviors of Posterior Predictive Tests and Their Unexpected Interpretation

https://doi.org/10.1093/molbev/msae051

Fabreti, Luiza Guimarães; Coghill, Lyndon M; Thomson, Robert C; Höhna, Sebastian; Brown, Jeremy M (March 2024, Molecular Biology and Evolution)
Pupko, Tal (Ed.)
Abstract Poor fit between models of sequence or trait evolution and empirical data is known to cause biases and lead to spurious conclusions about evolutionary patterns and processes. Bayesian posterior prediction is a flexible and intuitive approach for detecting such cases of poor fit. However, the expected behavior of posterior predictive tests has never been characterized for evolutionary models, which is critical for their proper interpretation. Here, we show that the expected distribution of posterior predictive P-values is generally not uniform, in contrast to frequentist P-values used for hypothesis testing, and extreme posterior predictive P-values often provide more evidence of poor fit than typically appreciated. Posterior prediction assesses model adequacy under highly favorable circumstances, because the model is fitted to the data, which leads to expected distributions that are often concentrated around intermediate values. Nonuniform expected distributions of P-values do not pose a problem for the application of these tests, however, and posterior predictive P-values can be interpreted as the posterior probability that the fitted model would predict a dataset with a test statistic value as extreme as the value calculated from the observed data.
Full Text Available
On the Need for New Measures of Phylogenomic Support

https://doi.org/10.1093/sysbio/syac002

Thomson, Robert C; Brown, Jeremy M (February 2022, Systematic Biology)
Carstens, Bryan (Ed.)
Abstract The scale of data sets used to infer phylogenies has grown dramatically in the last decades, providing researchers with an enormous amount of information with which to draw inferences about evolutionary history. However, standard approaches to assessing confidence in those inferences (e.g., nonparametric bootstrap proportions [BP] and Bayesian posterior probabilities [PPs]) are still deeply influenced by statistical procedures and frameworks that were developed when information was much more limited. These approaches largely quantify uncertainty caused by limited amounts of data, which is often vanishingly small with modern, genome-scale sequence data sets. As a consequence, today’s phylogenomic studies routinely report near-complete confidence in their inferences, even when different studies reach strongly conflicting conclusions and the sites and loci in a single data set contain much more heterogeneity than our methods assume or can accommodate. Therefore, we argue that BPs and marginal PPs of bipartitions have outlived their utility as the primary means of measuring phylogenetic support for modern phylogenomic data sets with large numbers of sites relative to the number of taxa. Continuing to rely on these measures will hinder progress towards understanding remaining sources of uncertainty in the most challenging portions of the Tree of Life. Instead, we encourage researchers to examine the ideas and methods presented in this special issue of Systematic Biology and to explore the area further in their own work. The papers in this special issue outline strategies for assessing confidence and uncertainty in phylogenomic data sets that move beyond stochastic error due to limited data and offer promise for more productive dialogue about the challenges that we face in reaching our shared goal of understanding the history of life on Earth. [Big data; gene tree variation; genomic era; statistical bias.]
Full Text Available
Comparing Likelihood Ratios to Understand Genome-Wide Variation in Phylogenetic Support

https://doi.org/10.1093/sysbio/syac014

Mount, Genevieve G.; Brown, Jeremy M. (March 2022, Systematic Biology)
Jermiin, Lars (Ed.)
Abstract Genomic data have only sometimes brought resolution to the tree of life. Large phylogenomic studies can reach conflicting conclusions about important relationships, with mutually exclusive hypotheses receiving strong support. Reconciling such differences requires a detailed understanding of how phylogenetic signal varies among data sets. Two complementary strategies for better understanding phylogenomic conflicts are to examine support on a locus-by-locus basis and use support values that capture a larger range of variation in phylogenetic information, such as likelihood ratios. Likelihood ratios can be calculated using either maximum or marginal likelihoods. Despite being conceptually similar, differences in how these ratios are calculated and interpreted have not been closely examined in phylogenomics. Here, we compare the behavior of maximum and marginal likelihood ratios when evaluating alternate resolutions of recalcitrant relationships among major squamate lineages. We find that these ratios are broadly correlated between loci, but the correlation is driven by extreme values. As a consequence, the proportion of loci that support a hypothesis can change depending on which ratio is used and whether smaller values are discarded. In addition, maximum likelihood ratios frequently exhibit identical support for alternate hypotheses, making conflict resolution a challenge. We find surprising support for a sister relationship between snakes and iguanians across four different phylogenomic data sets in contrast to previous empirical studies. [Bayes factors; likelihood ratios; marginal likelihood; maximum likelihood; phylogenomics; squamates.]
Investigating the Genomic Distribution of Phylogenetic Signal with CloudForest

https://doi.org/10.1145/3437359.3465605

Wagner, Reid; Toups, Benjamin S.; Deng, Zhifeng; Gallivan, Kyle A.; Brown, Jeremy M.; Wilgenbusch, James C. (July 2021, Proceedings of the Practice and Experience on Advanced Research Computing)

A central focus of evolutionary biology is inferring the historical relationships among species and using this context to learn about how evolution has shaped diverse organisms. These historical relationships are represented by phylogenetic trees, and the methods used to infer these trees have been an active area of research for several decades. Despite this attention, phylogenetic workflows have changed little, even though extraordinary advances have occurred in the scale and pace at which genomic data have been collected in the past 20 years. Modern phylogenomic datasets have also raised fascinating new questions. Why do different parts of a genome often support different relationships among species? How are these different signals distributed across chromosomes? We developed a new computational framework, CloudForest, to tackle such questions. CloudForest is flexible, efficient, and tightly integrates a diverse set of tools. Here, we briefly describe the architecture of CloudForest, including the advantages it provides, and use it to investigate the distribution of phylogenetic signal along the entire X chromosome of 24 cat (Felidae) species.
Full Text Available
Lessons learned from organizing and teaching virtual phylogenetics workshops

https://doi.org/10.18061/bssb.v1i2.8425

Barido-Sottani, Joëlle; Justison, Joshua A.; Borges, Rui; Brown, Jeremy M.; Dismukes, Wade; Do Rosario Petrucci, Bruno; Guimarães Fabreti, Luiza; Höhna, Sebastian; Landis, Michael J.; Lewis, Paul O.; et al (June 2022, Bulletin of the Society of Systematic Biologists)

No abstract available.
Full Text Available